(57) ABSTRACT

Method and apparatus for storing, retrieving, and processing 
customer-oriented data sets in which relatively large sets of 
customers and their transactions or the like may be quickly 
and efficiently analyzed. A multi-dimensional access struc- 
ture is utilized in which each cell representing one dimen- 
sion element or a combination of dimension elements may 
include a list of customers who have made purchases or 
other transactions associated with that cell. Each customer 
record in a list may summarize predetermined information 
pertaining to that customer's behavior in the cell. Such 
records may be sorted by customer number to enable effi- 
cient combining of multiple lists. In applications wherein the 
number of such cells is prohibitively large, only a portion of 
these cells may actually include materialized lists. 
Additionally, a user-specified subset of the cells may be
efficiently populated from input data, and lists which are not
materialized may be materialized on demand from other lists
in an efficient manner.



30 Claims, 4 Drawing Sheets 



[Front-page drawing: list model (22) with stores STO1, STO2, STO3 and
products PROD1, PROD2, together with customer lists 24, 26, 28, 30, 32,
and 34; see FIG. 2.]

06/04/2004, EAST Version: 1.4.1 



U.S. Patent    Jan. 30, 2001    Sheet 1 of 4    US 6,182,060 B1



[Drawing sheet 1: list model of FIG. 2, showing customer lists 24, 26,
28, 30, 32, and 34. Each list entry has the form (customer ID,
percentile within list, dollar amount within list).]







[FIG. 3 (Sheet 2 of 4): flow chart of list loading, including steps S3
and S4. First box: "User defines their data model. This includes
specifying dimensions with hierarchies, specifying which intra- and
inter-customer fields are to be stored in the lists, and which lists
are to be materialized in the initial load."]

METHOD AND APPARATUS FOR STORING,
RETRIEVING, AND PROCESSING MULTI-
DIMENSIONAL CUSTOMER-ORIENTED
DATA SETS

CROSS REFERENCE TO RELATED 
APPLICATION

This application is based upon a provisional application
Ser. No. 60/043,597, filed Apr. 15, 1997.

BACKGROUND OF THE INVENTION 

The present invention relates to a method and apparatus 
for storing, retrieving, and processing customer behavior 
data, account information and the like from a multi- 
dimensional viewpoint. 

Two types of systems for handling data representative of 
customers, accounts or the like may be utilized, that is,
On-line Analytic Processing ("OLAP") systems (both pro- 
prietary and relational database versions) and systems spe- 
cifically designed for database marketing. 

OLAP systems can be viewed as an extension of a 
spreadsheet paradigm in which a spreadsheet is a two- 
dimensional view of a data set. For example, product 
identification (ID) may be arranged on one axis, time on the 
other axis, and sales as the entry in the data cells. Multi- 
dimensional database systems may generalize such arrange- 
ment to allow more than two dimensions. For instance, in the 
previous example, in addition to product ID and time, 
geographical location may also be arranged as a third 
dimension. 

There are a number of products which may present users 
with a multi-dimensional view of their data. Such products 
may fall into two groups or systems: those that actually store 
the data using multi-dimensional data structures (arrays and
generalizations of arrays) and those that store the data in a 
relational database system. The former class of system may 
be referred to as "MOLAP" (for Multi-Dimensional OLAP), 
while the latter may be referred to as "ROLAP" (for Rela- 
tional OLAP). Both systems may answer queries about the 
contents of cells in a logical multi-dimensional space, which 
is similar to asking for the contents of a given cell in a
spreadsheet. Additionally, they may enable questions to be 
addressed regarding columns and rows by mathematical 
computation of columns and rows. For example, it may be 
desirous to obtain sales by product over all time periods, or 
sales of all products on a particular date. Further, in both 
systems, each cell may store a single number or a small set 
of numbers. 

In these systems, the use of "hierarchies" for dimensions 
may be employed. As an example, consider the "time" 
dimension. In such dimension, days may be the lowest level 
in the hierarchy, followed by weeks as the second level of 
the hierarchy, followed by months as the third hierarchical
level, and a fiscal year as the highest hierarchical level of the 
time dimension. As another example, consider geography. 
Here, stores may form the first hierarchical level, followed 
by districts, regions, and countries. Such use of hierarchies 
for dimensions may facilitate the ease of use of the system,
as the data is organized in a logical, user-oriented manner.
Additionally, such hierarchies may provide structural infor- 
mation to the system itself that can be used to answer queries 
efficiently. For example, if the sales of a given product by 
month are known, the sales of the product for a given year 
may be computed by summing the sales over the corre- 
sponding 12 months. Without the use of hierarchical 
information, it would be necessary to revert to the lowest 
level of detailed data to compute the sales for a year which,
as is to be appreciated, may be considerably slower.
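
The roll-up described above can be illustrated with a minimal Python sketch. The monthly figures and the function name `roll_up_to_year` are hypothetical and serve only to show how a month-to-year hierarchy lets yearly sales be derived from twelve monthly cells rather than from the lowest-level detail records.

```python
from collections import defaultdict

# Hypothetical monthly sales for one product, keyed by (year, month).
monthly_sales = {(1996, month): 100.0 + month for month in range(1, 13)}


def roll_up_to_year(monthly):
    """Sum month-level cells into year-level cells via the month -> year hierarchy."""
    yearly = defaultdict(float)
    for (year, _month), amount in monthly.items():
        yearly[year] += amount
    return dict(yearly)


yearly_sales = roll_up_to_year(monthly_sales)
```

Only twelve values are read per product and year, which is the saving the hierarchy provides over rescanning day-level data.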

While the use of OLAP systems may be acceptable for
certain types of applications (as, for example, in analyzing
the financial performance of a business), they may not be
acceptable for use with other applications such as those
involving customer-oriented data sets. For example, in
customer-oriented data sets, information pertaining to
individual customers should be retained. If such individual
customer information is omitted, it may be very difficult, if
not impossible, to analyze the data set at a unit of individual
customers. As a result, database marketing and other
applications may not be effectively performed or may even be
impossible to perform. To store individual customer
information in current systems normally requires that "customer"
is one of the dimensions. However, as is to be appreciated,
for any reasonably sized data set, such "customer" dimension
may be extremely large.

Multi-dimensional database systems typically assume that
dimension sizes will be reasonable. As such, "extremely
large" dimensions may present a serious problem. More
specifically, in current multi-dimensional database systems,
a dimension size of several tens or hundreds of elements
may be typical, and a dimension with 10,000 elements may
be considered very large. By contrast, the customer dimension
of a medium sized retailer or financial institution can
easily reach 10,000,000 or more elements. If one uses a
multi-dimensional database system on such a data set,
several problems may arise. First, since the techniques used
for good performance in these multi-dimensional systems
(heavy pre-summarization and sophisticated indexing) are
not effective, the performance may degrade such that
interactive use is very difficult or impossible. Second, the query
paradigm may not fit with the analyst's goals. This mismatch
arises because a standard way of displaying a multi-dimensional
query result (a table or graph) may be of limited
value when one of the axes has a million or more elements.

Due to the above-described limitations, current multi-dimensional
database systems may handle customer-oriented
data sets having a relatively large customer dimension
using one of two techniques. In a first technique, individual
customer information may be omitted, whereupon such
system is really a merchandise sales analysis tool rather than
a customer analysis tool. In a second technique, the large
number of customers (which may be 10,000,000+
customers) is statically segmented or arranged into a small
number of groups, and all future analysis is based on those
segments rather than on the individual customers comprising
the segments. Thus, both of these techniques may lose or
obliterate individual customer information which is a
substantial portion of the economically critical information that
true customer-centered data sets may contain. For this
reason, a typical multi-dimensional database tool may not be
effectively utilized for customer-oriented data processing.

Relational database systems, on the other hand, may store 
and process relatively large data sets. However, the models 
embodied in relational database systems are typically very 
simple and generic. For instance, all data may be represented
in two-dimensional tables. Further, such models may be
insufficient for many business intelligence applications. At
best, a relational model of a relational database system may
be used as a lower-level substrate upon which to build more
sophisticated and useful models. (Relational multi-dimensional
data analysis tools are examples.)

Therefore, neither relational database systems nor multi- 
dimensional OLAP tools may be effectively used for 
customer-oriented data analysis. In an attempt to handle
such analysis, a special purpose system tailored explicitly to
process a large list of records may be used. Such system may
be used in database marketing applications. There is no
multi-dimensional paradigm in these systems; typically the
data is represented in a very primitive structure, usually a
so-called flat file or a flat file and a collection of so-called
inverted files based upon that flat file. (A "flat file" is a file
of records without any extra structure imposed thereon. That
is, a flat file may have no index structures to speed access to
records within the file. On the other hand, an inverted file is
an auxiliary file based off of a main file but sorted on another
attribute. For example, consider a situation wherein a base
file has customer ID, store ID, and purchase amount which
are sorted in order of increasing customer ID number. In
such situation, an associated inverted file may be defined and
populated that again has customer ID, store ID, and purchase
amount, but which are sorted by store ID. This inverted file
may facilitate queries such as "find all purchases in store
27", since the records for purchases in store 27 will be
co-located in the inverted file.)

Unlike multi-dimensional database systems, the above-mentioned
special system may somewhat enable operations
involving large detailed lists of customer behavior.
However, these systems also have a number of serious
defects or disadvantages.

One disadvantage is that, unlike multi-dimensional database
systems, the model embodied by these special purpose
or "list processing" systems may not be rich or complete
enough to allow a large class of optimizations that improve
performance, nor may it be rich enough to facilitate a
structured analysis of a data set. As such, a collection of
ad-hoc queries and result sets may be provided, with no clear
relationship among them, no opportunities for system-based
optimal re-use of information, and no opportunities for
judicious pre-computation as a run-time query accelerator.
That is, in most if not all decision support applications,
query-time performance may be improved by computing
answers to common queries or sub-parts of sub-queries in
advance or ahead of time. However, determining which
pre-computed results may be used to assist in answering
which queries may be difficult unless a formal structure to
the system is provided which facilitates such process.

Another disadvantage is that these models or tools may
not provide a multi-dimensional view of data and/or may not
be integratable or usable with multi-dimensional data analysis
tools which are becoming the choice for business data set
analysis.

OBJECTS AND SUMMARY OF THE
INVENTION

An object of the present invention is to provide a system
which eliminates the above-described deficiencies of the
current OLAP and database marketing tools.

More specifically, it is an object of the present invention
to provide a system which combines the multi-dimensional
data analysis and interactive speed of an OLAP tool with the
detailed customer list handling capabilities of a customer
information system.

Other objects, features and advantages according to the
present invention will become apparent from the following
detailed description of illustrated embodiments when read in
connection with the accompanying drawings in which
corresponding components are identified by the same reference
numerals.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an apparatus for storing, retrieving,
and processing customer data, account data or the like
according to an embodiment of the present invention;

FIG. 2 illustrates diagrams to which reference will be made
in explaining a list model;

FIG. 3 is a flow chart to which reference will be made in
explaining the loading of a list;

FIG. 4 is a diagram to which reference will be made in
explaining the computing of a city list from a plurality of
store lists;

FIG. 5 is a flow chart illustrating the steps performed in
processing queries; and

FIG. 6 is a diagram illustrating alternative ways in which
the present invention may be incorporated.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will now
be described with reference to the accompanying drawings.

The present invention provides a two-level structure
together with processing methods that operate on the structure
and input data to facilitate efficient multi-dimensional
analysis of customer-oriented data sets. The upper level may
be a multi-dimensional index, while the lower level may
contain summarized lists corresponding to the cells in the
multi-dimensional space spanned by the index. In a preferred
embodiment, the upper level is a search-tree structure
whose keys are combinations of dimension
elements corresponding to lists materialized in the lower
level. However, other techniques for data indexing
(including but not limited to multi-dimensional arrays,
B-trees, R-trees, quad-trees, hashing-based structures, and
so forth) may also be used. All of these techniques solve the
same general problem: they map from "keys" to "data
values." For example, in a standard B-tree or hashing-based
structure over an employee record file, the "key" may be a
social security number of an employee, while the "data" may
be the record for the employee. In the present application,
the "key" may be a combination of dimension elements,
while the "data" may be the corresponding list.

As an example, consider a three-dimensional space or
model in which the dimensions are product, store, and time.
Furthermore, in this situation, assume that the product
dimension includes the elements shoes, shirts, and ties, that
the store dimension includes the elements east and west, and
that the time dimension has the dates Jan. 1, 1996, Jan. 2,
1996, Jan. 3, 1996, and Jan. 4, 1996. For ease of explanation,
assume that there are no hierarchies on the dimensions.
In the above situation, there are 24 distinct cells representing
all combinations of dimension elements (3 products *
2 stores * 4 dates). In addition, there may be a number of cells
corresponding to "projecting out" one or more of the dimensions.
For example, if time is omitted, product by store is
obtained which is a sub-array of 6 cells; if store is omitted,
product by date is obtained which is a sub-array of 12 cells;
if date and store are omitted, a three cell sub-array having
only the product dimension is obtained; and so forth.

In each cell, a list of records corresponding to
individual customer behavior in that cell may be stored. In
these records, a customer or account identification (ID)
field may be provided, which can be any number that
uniquely identifies a customer or account. Additionally,
there may be a number of user-selected fields corresponding
to information the user (or application developer) has
deemed important. For example, if the selected additional
fields are total dollars spent, number of purchases, and rank
within the cell, the records in the lists will have the format
customer ID, dollars spent, number of purchases, and rank.




Then, as an example, in a cell corresponding to (shoes, east,
Jan. 1, 1996), a record would be stored for each customer
who made at least one purchase of shoes in the east store on
Jan. 1, 1996. This record would have each such customer's
ID, the dollars the customer spent, and the number of
purchases the customer made on shoes in the east store on
Jan. 1, 1996.
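
The cell arithmetic and the key-to-list mapping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain dictionary stands in for the search-tree index, `None` is a hypothetical marker for a projected-out dimension, and the sample records are invented.

```python
from itertools import product

PRODUCTS = ["shoes", "shirts", "ties"]
STORES = ["east", "west"]
DATES = ["1996-01-01", "1996-01-02", "1996-01-03", "1996-01-04"]

ALL = None  # hypothetical marker for a dimension that has been "projected out"


def cell_keys(keep_product=True, keep_store=True, keep_date=True):
    """Enumerate cell keys, replacing each omitted dimension with ALL."""
    products = PRODUCTS if keep_product else [ALL]
    stores = STORES if keep_store else [ALL]
    dates = DATES if keep_date else [ALL]
    return list(product(products, stores, dates))


base_cells = cell_keys()                                      # 3 * 2 * 4 cells
product_by_store = cell_keys(keep_date=False)                 # 3 * 2 cells
product_by_date = cell_keys(keep_store=False)                 # 3 * 4 cells
product_only = cell_keys(keep_store=False, keep_date=False)   # 3 cells

# Upper level: index from cell key to list. Lower level: the list itself,
# holding (customer ID, dollars spent, number of purchases) sorted by ID.
index = {
    ("shoes", "east", "1996-01-01"): [(101, 25.0, 1), (238, 60.0, 2)],
}


def lookup(key):
    """Return the materialized list for a cell, or None if not materialized."""
    return index.get(key)
```

A lookup that returns `None` corresponds to a cell whose list has not been materialized and would have to be generated on demand.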

An apparatus for storing, retrieving, and processing cus- 
tomer data, account data and so forth is illustrated in FIG. 1. 
As shown therein, such apparatus 10 may include an input 
device 12, a processor 14, a memory 16, a display unit 18, 
and a printer 20. The memory may include a number of
portions or areas such as area A and area B. Each of such 
areas may include a respective type of memory or storage. 
For example, area A may be a semiconductor memory and 
area B may be a disc-type storage or memory. 

With regard to the above-mentioned two types of storage, 
the disk storage (which may be referred to as an external 
memory) may have a relatively high storage capacity and 
may be relatively inexpensive. Although access to data 
stored in such disk storage may be relatively slow, access to 
such data may be improved if data is sequentially accessed 
as compared to randomly accessed. (In sequential accessing, 
consecutive accesses are to adjacent storage locations on the 
disk. In random accessing, consecutive accesses may be to
locations scattered throughout the disk.) Further, the semi- 
conductor memory (which may be referred to as a main 
memory) may be faster than the external memory. Although 
the semiconductor memory may have less capacity than the 
disk storage, the present invention efficiently utilizes both 
types of storage. 

Input data (such as customer data, account data, and the
like) or requests for desired information (such as a user query)
may be inputted to the input device 12 by an operator,
whereupon a signal corresponding thereto may be generated
and supplied to the processor 14. The processor 14 is
adapted to process the received data and/or to provide the
requested information. That is, the processor 14 may receive
data and/or requests from the input device 12 and previously
stored instructions from the memory 16A and, in accordance
therewith, may process the received data (for example,
translate the user query to structured query language (SQL)),
store the processed data in the memory 16B, and/or cause
the requested information to be displayed on the display unit
18 or printed by the printer 20 (for example, execute the
translated user query in a relational database system
environment). Such operation of the processor 14 will be
hereinafter more fully described.

FIG. 2 illustrates a model representing two products and three
stores, wherein information indicative of the amount each
customer spent on each product and/or in each store may
have been supplied to the processor 14 by way of the input
device 12 so as to be processed and/or stored in the memory
16. Assume that an operator wishes to know the total amount
each customer spent in each store, the total amount each
customer spent on each product, and the total amount each
customer spent on each product in each store. In this
situation, the operator may input a command requesting
such information by use of the input device 12. As a result,
a corresponding request signal is supplied to the processor
14, whereupon the appropriate processing and/or retrieval of
data is performed and the requested information is supplied
to the display 18 so as to be displayed thereat and/or the
printer 20 so as to obtain a printed copy of such information.
Based upon the operator's request, eleven "lists" may be
obtained. Six of such lists are illustrated as lists 24-34. That is,
lists for Sto2, for Sto3, for Prod1, for Prod2, and for combinations
(Prod2, Sto1) and (Prod1, Sto3) are respectively shown as
24-34. For clarity, five of the lists have been omitted from
the picture. The fields stored in the lists are ID, percentile
within list, and dollar amount within list. So, for example,
with respect to product 1 in store 3 as shown in the list for
(Prod1, Sto3) 34, customer 101 is at the 25th percentile and
has made purchases totaling $37. Similarly, with respect to
product 1 as shown in the list for Prod1 28, customer 101 is
in the 56th percentile with purchases of $500.

The present invention enables efficient and intuitive
multi-dimensional access to customer-level data. That is,
one merely specifies the cell or set of cells of interest; by
using the upper-level structure (the multi-dimensional
index) one is directed to the appropriate list; and in the list,
the information about all customer behavior relevant to that
cell is immediately available by a simple scan of the list. The
information in the list may be summarized by customer
specifically for the cell in which it resides. Such arrangement
is preferable, since any more highly summarized information
may result in information being discarded or lost, while
any less highly summarized information may increase the
data size and slow retrieval and subsequent processing.
Furthermore, since the list may only contain records of
customers who actually made purchases in the cell in
question, the number of entries in such a list may be
substantially less than and, in fact, may be orders of magnitude
less than the number of customers and transactions in
the total data set. Thus, a query on a specific cell may
examine a relatively small set or the minimal set of data
necessary to answer the query.

To effectively utilize the storage structure or capability,
the present invention enables the lists for populating the
cells of the multi-dimensional space to be efficiently generated
from input data and enables operations to be performed
on existing lists to generate new ones in response to user
queries. The ability to efficiently generate the lists is useful
because lists may be generated in a number of situations
during the operation of the present invention. First, lists may
be generated to initially populate the database. Second, as
mentioned previously, in large multi-dimensional models it
may be unfeasible to populate all cells of the multi-dimensional
space. In such situations, a preferred procedure
may be to initially populate a subset of the cells and to
generate lists for other cells on an "as needed" basis. Third,
during a user analysis session or the like, users may ask
queries which require existing lists to be combined so as to
generate a new list that constitutes the answer to the query.
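
As a rough illustration of why lists sorted by customer ID are convenient to combine, the following Python sketch merges two such lists in a single sequential pass. The record layout (customer ID, dollar amount) and the choice to sum amounts for customers present in both lists are assumptions made for the example, not the combination rule of the present description.

```python
def combine_lists(list_a, list_b):
    """Merge two lists sorted by customer ID, summing dollars for shared customers."""
    merged = []
    i = j = 0
    while i < len(list_a) and j < len(list_b):
        id_a, amount_a = list_a[i]
        id_b, amount_b = list_b[j]
        if id_a == id_b:
            # Customer appears in both cells: emit one combined record.
            merged.append((id_a, amount_a + amount_b))
            i += 1
            j += 1
        elif id_a < id_b:
            merged.append(list_a[i])
            i += 1
        else:
            merged.append(list_b[j])
            j += 1
    # At most one of these tails is non-empty; it is already in ID order.
    merged.extend(list_a[i:])
    merged.extend(list_b[j:])
    return merged


combined = combine_lists([(101, 25.0), (238, 91.0)], [(101, 37.0), (553, 50.0)])
```

Because both inputs are read front to back exactly once, the result is itself sorted by customer ID and can be combined again without re-sorting.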

In customer information database applications, the initial
data set may come from a number of sources, including
"operational data" which pertains to the customer's
transactions, and demographic data which may describe
demographic information pertaining to the customers which
may not be related to their transactions. For example, the
operational data may include a large transaction file containing
information for each transaction by each customer,
and the demographic data may be obtained from information
provided by the customers (such as when their accounts
were initially established) and/or may be purchased from a
third-party provider.

Prior to generating the lists for populating the multi-dimensional
space, the customer or account numbers should
be standardized and assigned for all sources of information
such that each customer or account has a unique identifier.
Thereafter, the input files are to be sorted based upon this
identifier, such as standardized customer/account number(s).
The order established in this sort is preferably maintained
throughout the operation of the present invention. Such
order may allow operations to be performed efficiently
which may otherwise result in more lengthy computation
and/or need more computer memory.

To see how the above-mentioned order is utilized in the
present invention, data pertaining to records in the lists may
be arranged into two types of user-specified fields, that is, an
intra-customer field and an inter-customer field. An
intra-customer field(s) may be computed for a respective
customer by examining base-level records of that customer in
isolation. For example, to count the number of purchases
made by such respective customer in a particular store, only
the records for that customer are examined. In this example, 
the behavior or records of other customers are irrelevant. On 
the other hand, inter-customer fields may utilize information 
from a number of intra-customer fields of other customers in 
the cell. For example, consider the situation wherein it is 
desired to store or obtain a field having the rank of a customer
in a given cell with regard to a predetermined criterion, such
as the ranking with regard to total purchases made. (In this
situation, a rank of 1 may indicate the customer having made 
the most purchases, a rank of 2 may indicate the customer 
having made the second most purchases, and so forth.) As is 
to be appreciated, in this situation, purchases of other 
customers in that cell are utilized or examined to determine 
the rank of the desired customer. 

The present invention provides an efficient mechanism for
computing both types of fields (that is, intra-fields and 
inter-fields). As an example, consider the situation wherein 
one relatively large input file of customer transactions exists 
in which each record in this file contains the customer's 
account number, the store ID for the store in which the 
purchase was made, the product that was purchased, the date 
upon which the purchase was made, and the dollar amount 
for the transaction. Suppose, in this situation, that the list for 
the "shoes" cell is desired wherein the records in this list 
have fields for the customer ID, total sales, and rank. As a 
result, a record may be obtained for a customer (such as 
customer #270,567) having the total purchase amount for all 
shoes bought by customer #270,567, and the rank of this 
customer (which indicates whether this customer purchased 
the most shoes, the second most, and so forth). 

The cell list for "shoes" may be generated in a single pass
over the input transaction file, which has been sorted by
customer ID. As each new customer's records are encountered
(all such records are adjacent), the apparatus 10
extracts records which pertain to the purchases of shoes. The 
apparatus 10 may sum the purchases or sales for such 
records, and retain or store the record for this customer in the 
list for the "shoes" cell in the memory 16. If other intra- 
customer fields have been specified or requested for this list, 
they may be computed or obtained in the same pass. At this 
point, all inter-customer fields may be blank. Upon com- 
pleting the scan of the input transaction file, the entire list for 
the "shoes" cell may be computed or obtained such that the 
intra-customer fields may be correctly computed and the 
inter-customer fields may be empty or blank. 

Therefore, the above-described results pertaining to the
cell list for "shoes" may be obtained with only a single pass
of the transaction file. Further, the records of the list may be
generated in the correct order without additional sorting. 
Furthermore, the capacity of the main memory may be such 
so as to hold the input transaction records of a single 
customer and the obtained record, and the reading and 
writing of data from/to the external memory may be per- 
formed in a "sequential" manner. As such, the main memory 
and external memory may be efficiently utilized. 
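
The single pass described above might be sketched in Python as follows. The transaction tuples, the dictionary record layout, and the function name `build_cell_list` are hypothetical; the sketch assumes the input is already sorted by customer ID, so that `itertools.groupby` sees each customer's records as one adjacent run.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical transactions: (customer ID, store, product, date, amount),
# already sorted by customer ID.
transactions = [
    (101, "east", "shoes", "1996-01-01", 20.0),
    (101, "west", "ties", "1996-01-02", 5.0),
    (101, "west", "shoes", "1996-01-03", 17.0),
    (238, "east", "shirts", "1996-01-01", 30.0),
    (553, "east", "shoes", "1996-01-04", 50.0),
]


def build_cell_list(sorted_transactions, product_name):
    """First pass: fill intra-customer fields; inter-customer fields stay blank."""
    cell_list = []
    for customer_id, records in groupby(sorted_transactions, key=itemgetter(0)):
        # Only this customer's records are held while the record is computed.
        amounts = [r[4] for r in records if r[2] == product_name]
        if amounts:  # customers with no purchase in the cell get no record
            cell_list.append(
                {"customer_id": customer_id, "total_sales": sum(amounts), "rank": None}
            )
    return cell_list


shoes_list = build_cell_list(transactions, "shoes")
```

The output list inherits the customer ID order of the input, so no additional sort is needed, and the `rank` field is left as `None` until the inter-customer pass.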

Although in the above description a single list was
obtained from a single pass of the input file, the present
apparatus is not so limited. That is, the apparatus 10 may
enable a plurality of lists to be computed or obtained for a
plurality of cells from a single pass of the input file. At the
end of this pass, the apparatus 10 may have computed lists
for all cells such that the intra-customer fields have been
filled in or computed and the inter-customer fields have been
left empty or blank. In this situation, the capacity of the
memory 16 is such so as to hold the input transaction file and
customer records for each list to be generated.

The apparatus 10 may also handle or process multiple 
input data sources or files. That is, after the input files have 
been sorted in accordance with a predetermined sorting 
criteria (such as based on customer ID), the lists may be 
generated with a single, synchronized "merge" pass through 
the sorted input files. During this processing, the apparatus 
10 may consider each account or customer number in turn 
while stepping through the input files. Since the input files 
are sorted, the information for computing the intra-customer 
records is available. Additionally, in this situation, the 
memory capacity for the memory 16 and the amount of 
computation performed by the processor 14 may be minimal 
or relatively small. 
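
A synchronized merge pass over several sorted inputs can be sketched with the Python standard library as shown below. The two input files and their record layout are hypothetical; `heapq.merge` is used here as one convenient way to step through already-sorted sources in customer ID order without loading them fully into memory.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical input sources, each sorted by customer ID:
# (customer ID, source name, amount).
store_file = [(101, "store", 25.0), (238, "store", 91.0)]
mail_order_file = [(101, "mail", 12.0), (553, "mail", 50.0)]


def merged_customer_groups(*sorted_files):
    """Yield (customer ID, records) once per customer across all sorted inputs."""
    merged = heapq.merge(*sorted_files, key=itemgetter(0))
    for customer_id, records in groupby(merged, key=itemgetter(0)):
        yield customer_id, list(records)


totals = [
    (customer_id, sum(record[2] for record in records))
    for customer_id, records in merged_customer_groups(store_file, mail_order_file)
]
```

Each customer's records from every source arrive together, so the intra-customer fields can be computed while holding only one customer's records at a time.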

With regard to the inter-customer fields, the apparatus 10 
may perform a second pass over the partially filled-in lists so
as to compute or obtain the inter-field data. Since inter- 
customer fields in a given cell may depend only upon the 
intra-customer fields of the records in the cell, the apparatus 
10 may be able to perform such processing without referring 
back to input data or to records in lists of other cells. Since 
the lists in the cells may be substantially smaller than the 
input data set, such procedure may be efficient. As an 
example of such inter-customer field processing, the "per- 
centile" field within a cell (of the above-described example) 
may be computed by sorting the records in the list on sales. 

Thus, to compute or obtain the initial lists for the cells, the 
apparatus 10 may utilize a two-phase (two-pass) procedure 
wherein the intra-customer fields may be computed during a 
first pass over the input data files and the inter-customer 
fields may be computed during a second pass over the 
partially filled-in lists. As previously described, such procedure
results in an efficient use of computation and storage.

The above-described procedure for obtaining the initial 
list(s) is outlined in the flow chart of FIG. 3. As shown 
therein, processing may be initiated at step S1 wherein a user
may define a model and supply the same to the apparatus 10.
In defining the model, the user may indicate the dimensions
with hierarchies, may specify the intra- and inter-customer
fields which are to be stored in the lists, and may specify the
lists which are to be initially obtained. At step S1, the user
may also map the model to the base or input data. 

Processing may then proceed to step S2 wherein a deter- 
mination is made as to whether the base files have been 
sorted. If the determination in step S2 is negative, processing 
proceeds to step S5 wherein the files are sorted in accordance
with a predetermined or desired manner. Thereafter,
processing proceeds to step S3. If, however, the determination
in step S2 is affirmative, processing proceeds directly to
step S3. 

At step S3, the input data may be scanned wherein the 
information for each customer may be read and summarized
for the intra-customer fields for the specified base lists. 
Thereafter, processing may proceed to step S4. 

At step S4, the lists generated in step S3 may be scanned 
so as to obtain the inter-customer fields. 
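Steps S2, S5, and S3 can be illustrated with the short Python sketch below, in which one scan over the sorted input summarizes each customer into several base lists at once. The transaction layout (customer_id, store, product, amount), the choice of "by store" and "by product" as the base lists, and the name `build_base_lists` are assumptions made for illustration; step S4 is left as a comment.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def build_base_lists(transactions):
    """transactions: (customer_id, store, product, amount) tuples (assumed layout)."""
    # S2: are the base records already sorted by customer ID?
    ids = [t[0] for t in transactions]
    if any(a > b for a, b in zip(ids, ids[1:])):
        # S5: if not, sort them first.
        transactions = sorted(transactions, key=itemgetter(0))

    lists = defaultdict(list)  # cell key -> list of (customer_id, total)
    # S3: a single scan, summarizing each customer into every base list.
    for customer_id, group in groupby(transactions, key=itemgetter(0)):
        totals = defaultdict(int)
        for _, store, product, amount in group:
            totals[("store", store)] += amount
            totals[("product", product)] += amount
        for cell, total in totals.items():
            lists[cell].append((customer_id, total))
    # S4 (inter-customer fields) would be a second scan over each list.
    return dict(lists)


rows = [(231, "ST01", "PROD2", 41), (101, "ST01", "PROD1", 25),
        (101, "ST02", "PROD1", 12)]
base = build_base_lists(rows)
```

Since customers are visited in ascending order, every list produced this way is already sorted by customer ID.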

Next, a procedure for computing new lists from existing 
lists at query time will be described. 
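One query-time computation covered in this part of the description is an and-not query: customers who have spent more than $20 on socks and who do not appear in the shoe list. As a minimal sketch, assuming each list is a Python list of (customer_id, total) tuples sorted by customer ID (the function and variable names are hypothetical), a synchronized merge pass over two such lists might look like this:

```python
def spent_over_without(primary, excluded, threshold):
    """Return customer IDs in `primary` with total > threshold and absent from `excluded`.

    Both lists hold (customer_id, total) records sorted by customer_id.
    """
    result = []
    j = 0  # cursor into `excluded`
    for customer_id, total in primary:
        # Advance the second cursor; it never moves backwards, so each
        # list is scanned at most once.
        while j < len(excluded) and excluded[j][0] < customer_id:
            j += 1
        in_excluded = j < len(excluded) and excluded[j][0] == customer_id
        if total > threshold and not in_excluded:
            result.append(customer_id)
    return result


# Hypothetical sock and shoe lists, each sorted by customer ID.
socks = [(101, 25), (103, 98), (231, 15), (238, 91)]
shoes = [(103, 60), (553, 40)]
new_list = spent_over_without(socks, shoes, 20)
```

The number of comparisons grows linearly with the combined length of the two lists, and the new list comes out sorted by customer ID.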
As previously indicated, if the number of cells in a
multi-dimensional space is too large, one may choose not to
store or materialize all of these cells. As an example, and
with reference to the above-described example, one may
choose to only materialize lists for purchases by store,
purchases by product, and/or purchases by date. However, if
only such lists are materialized, one may be unable to
answer queries on cells having lists which have not been
materialized. In other words one may be unable to answer
a question regarding a not-yet-computed cell which is higher
in the hierarchical chain. For example, although one may be
able to answer a question regarding purchases by store, one
may be unable to answer a question regarding purchases by
city in the above example.

The present apparatus may enable the above question
regarding purchases by city to be answered by utilizing a
hierarchical multi-dimensional model and by sorting the lists
in the cells. More specifically, to compute the purchases by
a city, the apparatus 10 may identify the cells corresponding
to sales by store for each store within the city and then
compute the intra-customer fields in the new list by
performing a so-called synchronized merge-scan of the existing
lists. That is, the apparatus 10 may maintain in the memory
16 the purchases of a customer in each store in the city. The
apparatus may combine or merge these purchases so as to
compute the purchases for the customer in the city and form
that customer's "city" record. After this merge, the list for
the city may be in the proper sorted order and may be
complete except for the inter-customer fields. Such fields
may be computed by a second scan of the new city list. As
with populating the initial list, this processing utilizes a
minimal or relatively small amount of computation processing
and memory capacity or storage.

The above-described procedure for obtaining purchases
by city is outlined in the flow chart of FIG. 4. As shown
therein, lists for stores in the city representing purchases
made by customers may be obtained at step S10. Processing
may then proceed to step S20 wherein such purchases may
be combined or merged so as to compute the purchases for
the customer(s) in the city or intra-customer fields. The
resulting city list, which has the inter-customer fields blank,
may be supplied to step S30 wherein the inter-customer
fields may be computed. Thereafter, the city list having the
inter-customer fields is available.

Further, during a query session, the present apparatus
enables lists to be materialized which may not be part of the
multi-dimensional space. As an example, the apparatus 10
enables a list to be materialized for all customers who have
spent more than $20 on socks and have not yet purchased
shoes. To perform such operation, the apparatus 10 performs
a synchronized "merge" pass through the list for socks and
the list for shoes. That is, as each customer in the sock list
is encountered, the apparatus 10 may determine if the sock
purchase total exceeds $20, and if so, the apparatus 10 may
further determine if the respective customer does not appear
in the shoe list. If both determinations are found to be true,
a record or indication of the respective customer may be
added to the new list. Such procedure is continued until
reaching the end of the sock list. Thus, this procedure may
be performed with only a single scan of both the socks and
shoes lists, and the number of comparisons made during the
determinations is linear or proportional to the number of
records in these lists. Such procedure enables an optimal use
of computer hardware. The steps which may be performed
by the apparatus 10 in processing queries is illustrated in the
flow chart shown in FIG. 5.

Accordingly, the present apparatus may facilitate a query
which involves a Boolean (and, or, and-not) combination of
lists, along with filters on the inter-customer and
intra-customer fields in these lists. These filters may be
comparators or predicatives on the field(s) of the records in
the lists. For example, in determining purchases greater than
$20.00 a filter or comparator may be used. Such query may be
answered by performing a linear scan of the appropriate lists
and performing computations (such as comparison
computations) which are linear or proportional in number to
[illegible] a relatively small or minimal [illegible] single
customer) [illegible] utilized.

[illegible] illustrates a number of ways in which the present
invention may be incorporated. However, the present
invention is not limited to these ways and may be incorporated
in a number of other ways.

Thus, the present invention enables data to be retrieved
and/or processed in a relatively high efficient manner.
Additionally, by storing and processing data in a
multi-dimensional arrangement, the present invention facilitates
[illegible] thereof by an operator.

The present invention may be embodied in a number of
[illegible] structures stored in computer memory and
procedures for [illegible] structures. The present invention is
preferably implemented using software and an apparatus
having a memory and a processor (such as the apparatus 10).
However, the present invention could also be implemented
utilizing hardware.

Further, in addition to the arrangement described above,
a number of alternatives for the upper-level index and the
lower-level lists may be utilized. For example, the
upper-level index could be embodied by a search-tree structure, in
which the "keys" (corresponding to cells populated with
dimension elements) are names of the lists (such as a table
name, a file name, etc.) in the multi-dimensional space. Such
search-tree structure may be a B-tree, the directory structure
employed by a computer operating system in its file
management structure, or a system catalog of a relational
database system. The lower-level lists may be stored in the files
of an operating system. Alternatively, such lists may be
stored as tables in a relational database system, where each
[table may] correspond to one or a combination of "keys" of
the upper-level index structure. This latter technique may be
preferable for the situation wherein relational database
system catalogs are used to store the multi-dimensional index.

Furthermore, although the present invention was
primarily described with reference to "customers", the present
invention is not so limited. That is, the present invention
may be applied to other data items which may have a large
number of entries such as accounts, so-called register ring
numbers, or the like.

Although preferred embodiments of the present invention
and modifications thereof have been described in detail
herein, it is to be understood that this invention is not limited
to these embodiments and modifications, and that other
modifications and variations may be effected by one skilled
in the art without departing from the spirit and scope of the
invention as defined by the appended claims.

What is claimed is:

1. A method for obtaining desired information from data
representative of a number of data items, comprising the
steps of:

storing in a first memory portion of a computer memory
a plurality of cells being represented as a
multi-dimensional storage structure that is defined by a
plurality of dimensions, wherein each of said
dimensions includes a plurality of members for identifying
desired information, each of said cells corresponding
to at least one of said members;

storing in a second memory portion of said computer
memory a plurality of data lists, each of said data lists 
comprising a plurality of records for storing said 
desired information; 

linking each said cell with a respective data list such that 
each respective cell represents a multi-dimensional
index to the corresponding data list, whereby each said 
data list is identified by said at least one of said 
members; and 

wherein each said data item corresponds to a customer 
such that said desired information selectively repre- 
sents at least one transaction carried out by said cus- 
tomer or demographic data about said customer, said 
one transaction or demographic data being selectively 
defined by said at least one of said members. 

2. The method according to claim 1, further comprising 
selectively storing in each of said records a first field 
corresponding to intra-data item summary information 
which is obtained by processing said desired information for 
the corresponding data item without referring to other data 
items in a respective data list, and selectively storing in each 
said record a second field corresponding to inter-data item
summary information which is obtained by processing said 
desired information for the corresponding data item by 
referring to said other data items in said respective data list. 

3. The method according to claim 1, wherein each said 
cell is linked with the corresponding data list via a pointer.

4. The method according to claim 3, wherein said plurality 
of cells in said first memory portion is arranged as a file 
system directory, each said cell being represented by a file 
name, each said data list in said second memory portion 
being represented by a file, said pointer being said file name. 

5. The method according to claim 3, wherein said plurality 
of cells in said first memory portion is arranged as a 
relational database, at least one of said cells being represented
by a table name, each said data list in said second
memory portion being represented by a table, and said 
pointer being represented by said table name. 

6. The method according to claim 3, wherein said plurality 
of cells in said first memory portion is arranged as one of a 
B-tree, a quad-tree, an R-tree, and an array, wherein each 
said data list is represented by a file. 

7. The method according to claim 1, wherein said respec- 
tive data list is identified by a combination of said members 
from different dimensions. 

8. A method for generating a plurality of data lists in a 
multi-dimensional storage structure for storing desired information
obtained from data representative of data items, said
desired information having been stored in a first memory 
portion of a computer memory as a plurality of cells being 
represented as said multi-dimensional storage structure that
is defined by a plurality of dimensions, wherein each of said 
dimensions includes a plurality of members for identifying 
said desired information, each of said cells corresponding to 
at least one of said members, said desired information 
having been further stored in a second memory portion of 
said computer memory as a plurality of data lists, each of 
said data lists comprising a plurality of records for storing 
said desired information and an identifier for each said data 
item, said method comprising the steps of: 

sorting said data by using said identifier; 
extracting from the sorted data intra-data item summary 
information which is obtained by processing said 
desired information for each corresponding data item
without referring to other data items in a respective data 
list; 

supplying the extracted intra-data item summary infor- 
mation to respective records in each said data list; 

accessing each said data list for determining inter-data
item summary information which is obtained by processing
the extracted intra-data item summary information
for the corresponding data item by referring to
said other data items in said respective data list; 

supplying said inter-data item summary information to 
said respective records; 

linking each said cell with said respective data list such 
that each said cell represents a multi-dimensional index 
to each said data list, whereby each said data list is 
identified by said at least one of said members; and 

wherein each said data item corresponds to a customer 
such that said data selectively represents at least one 
said transaction carried out by said customer, said one 
transaction or demographic data being selectively 
defined by said at least one of said members. 

9. The method according to claim 8, wherein said respec- 
tive data list is identified by a combination of said members 
from different dimensions. 

10. The method according to claim 8, further comprising 
generating said data lists for a subset of said cells. 

11. The method according to claim 10, further comprising 
performing Boolean operations on said data lists to generate 
a new list corresponding to a respective cell. 

12. The method according to claim 11, wherein said 
dimensions are represented by a plurality of hierarchical 
orders, said new list being higher in a hierarchical order 
corresponding to a preselected dimension than the previously
generated data lists corresponding to said preselected
dimension. 

13. The method according to claim 8, further comprising 
filtering said respective records in said data lists to generate 
a new list which does not correspond to any one of said cells. 

14. An apparatus for obtaining desired information from 
data representative of a number of data items, comprising: 

means for storing in a first memory portion of a computer 
memory a plurality of cells being represented as a 
multi-dimensional storage structure that is defined by a 
plurality of dimensions, wherein each of said dimen- 
sions includes a plurality of members for identifying 
said desired information, each of said cells corresponding
to at least one of said members;

means for storing in a second memory portion of said 
computer memory a plurality of data lists, each of said 
data lists comprising a plurality of records for storing 
said desired information; 

means for linking each said cell with a respective data list 
such that each respective cell represents a multi- 
dimensional index to the corresponding data list, 
whereby each said data list is identified by said at least 
one of said members; and 

wherein each said data item corresponds to a customer
such that said data selectively represents at least one 
transaction carried out by said customer or demo- 
graphic data about said customer, said one transaction 
or demographic data being selectively defined by said 
at least one of said members. 

15. The apparatus according to claim 14, further comprising
means for selectively storing in each of said records
a first field corresponding to intra-data item summary
information which is obtained by processing said desired
information for each corresponding data item without referring to
other data items in said respective data list, and means for 
selectively storing in each said record a second field corre- 
sponding to inter-data item summary information which is 
obtained by processing said desired information for the 
corresponding data item by referring to said other data items 
in said respective data list. 

16. The apparatus according to claim 14, wherein each 
said cell is linked with the corresponding data list via a 
pointer.

17. The apparatus according to claim 16, wherein said 
plurality of cells in said first memory portion is arranged as 
a file system directory, each said cell being represented by 
a file name, each said data list in said second memory 
portion being represented by a file, and said pointer being 
said file name. 

18. The apparatus according to claim 16, wherein said 
plurality of cells in said first memory portion is arranged as 
a relational database, at least one of said cells being repre- 
sented by a table name, each said data list in said second 
memory portion being represented by a table, said pointer 
being represented by said table name. 

19. The apparatus according to claim 16, wherein said 
plurality of cells in said first memory portion is arranged as 
one of a B-tree, a quad-tree, an R-tree, and an array, each 
said data list being represented by a file. 

20. The apparatus according to claim 14, wherein said 
respective data list is identified by a combination of said 
members from different dimensions. 

21. An apparatus for generating, in response to a user 
query, at least one data list in a multi-dimensional storage 
structure for storing desired information obtained from data 
representative of data items, said desired information being 
stored in a first memory portion of a computer memory as a 
plurality of cells being represented as said multi-dimensional
storage structure that is defined by a plurality of
dimensions, wherein each of said dimensions includes a 
plurality of members for identifying said desired 
information, each of said cells corresponding to at least one 
of said members, said desired information being further 
stored in a second memory portion of said computer 
memory as a plurality of data lists, each of said data lists 
comprising a plurality of records for storing said desired
information and an identifier for each said data item, said 
apparatus comprising: 

means for sorting said data by using said identifier; 

means for extracting from the sorted data intra-data item
summary information which is obtained by processing 
said desired information for each corresponding data 
item without referring to other data items in a respec- 
tive data list; 

means for supplying the extracted intra-data item sum- 
mary information to respective records in each said list; 

means for supplying the extracted intra-data item sum- 
mary to respective records in each said data list; 

means for accessing each said data list for determining 
inter-data item summary information which is obtained 
by processing the extracted intra-data item summary 
information for the corresponding data item by referring
to said other data items in said respective data list;

means for supplying said inter-data item summary information
to said respective records;

means for linking each said cell with said respective data 
list such that each said cell represents a multi-dimensional
index to the corresponding data list,
whereby each said data list is identified by said at least 
one of said members; 



means for outputting said data lists; and

wherein each said data item corresponds to a customer 
such that said data selectively represents at least one 
transaction carried out by said customer or demo- 
graphic data about said customer, said one transaction 
or demographic data being selectively defined by said 
at least one of said members. 

22. The apparatus according to claim 21, wherein said 
respective data list is identified by a combination of said 
members from different dimensions. 

23. The apparatus according to claim 21, further com- 
prising means for generating said data lists for a subset of 
said cells. 

24. The apparatus according to claim 23, further com- 
prising means for performing Boolean operations on said 
data lists to generate a new data list corresponding to a 
respective cell. 

25. The apparatus according to claim 24, wherein said 
dimensions are represented by a plurality of hierarchical 
orders, said new data list being higher in a hierarchical order 
corresponding to a preselected dimension than the previously
generated data lists corresponding to said preselected
dimension. 

26. The apparatus according to claim 21, further com- 
prising means for filtering said respective records in said 
data lists to generate a new data list which does not corre- 
spond to any one of said cells. 

27. The apparatus according to claim 21, wherein said 
means for outputting is one of a display device and a printer. 

28. The apparatus according to claim 21, further com- 
prising means for translating said user query to structured 
query language (SQL) and means for executing the trans- 
lated user query in relational database system environment. 

29. An apparatus for generating, in response to a user 
query, at least one data list in a multi-dimensional storage 
structure for storing desired information obtained from data 
representative of data items, said apparatus comprising: 

means for storing said desired information as a plurality 
of cells being represented as said multi-dimensional 
storage structure that is defined by a plurality of 
dimensions, wherein each of said dimensions includes 
a plurality of members for identifying said desired 
information, each of said cells corresponding to at least 
one of said members; 

means for further storing said desired information as a 
plurality of data lists, each of said data lists comprising
a plurality of records for storing said desired informa- 
tion and an identifier for each said data item; 

means for sorting said data by using said identifier; 

means for extracting from the sorted data intra-data item 
summary information which is obtained by processing 
said desired information for each corresponding data 
item without referring to other data items in a respec- 
tive data list; 

means for supplying the extracted intra-data item sum- 
mary information to respective records in each said 
data list; 

means for accessing each said data list for determining 
inter-data item summary information which is obtained 
by processing the extracted intra-data item summary 
information for the corresponding data item by refer- 
ring to said other data items in said respective data list; 

means for supplying said inter-data item summary infor- 
mation to said respective records; 

means for linking each said cell with said respective data 
list such that each said cell represents a multi-dimensional
index to the corresponding data list,
whereby each said data list is identified by said at least 
one of said members; and 
means for outputting said data lists.

30. A method for generating, in response to a user query,
at least one data list in a multi-dimensional storage structure
for storing desired information obtained from data
representative of data items, said method comprising the steps of:

storing said desired information as a plurality of cells
being represented as said multi-dimensional storage 
structure that is defined by a plurality of dimensions, 
wherein each of said dimensions includes a plurality of 
members for identifying said desired information, each 
of said cells corresponding to at least one of said 
members; 

storing said desired information as a plurality of data lists, 
each of said data lists comprising a plurality of records 
for storing said desired information and an identifier for 
each said data item;

sorting said data by using said identifier; 

extracting from the sorted data intra-data item summary 
information which is obtained by processing said 
desired information for each corresponding data item
without referring to other data items in a respective data 
list; 

supplying the extracted intra-data item summary infor- 
mation to respective records in each said data list; 

accessing each said data list for determining inter-data 
item summary information which is obtained by pro- 
cessing the extracted intra-data item summary infor- 
mation for the corresponding data item by referring to 
said other data items in said respective data list; 

supplying said inter-data item summary information to 
said respective records; 

linking each said cell with said respective data list such 
that each said cell represents a multi-dimensional index 
to the corresponding data list, whereby each said data
list is identified by said at least one of said members; 
and 

outputting said data lists. 